26 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Egyptian Arabic English French German Hindi Iranian Persian Japanese Korean Mandarin Chinese Russian Spanish Tamil Vietnamese
Availability:
From Owner
License:
LDC
Size:
46 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2003 NIST Language Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Dari English German Hindi Iranian Persian Japanese Korean Mandarin Chinese Persian Russian Spanish Standard Arabic Tamil Thai Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2007 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Bengali Czech Dari English Hindi Lao Mandarin Chinese Mesopotamian Arabic Moroccan Arabic North Levantine Arabic Panjabi Persian Polish Pushto Russian Slovak South Levantine Arabic Spanish Standard Arabic Tamil Thai Turkish Ukrainian Urdu
Availability:
From Owner
License:
LDC
Size:
204 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2011 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
Arabic Catalan Chinese Dutch Estonian French German Indonesian Italian Japanese Latvian Mongolian Persian Portuguese Russian Slovenian Spanish Swedish Tamil Turkish Welsh
Availability:
Freely Available
License:
CC0
Size:
2880 hours
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: CoVoST 2 and Massively Multilingual Speech Translation
- Paper track: 12.1 Spoken machine translation/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Juan Pino | CoVoST 2 | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bengali Cantonese Mandarin Chinese Min Nan Chinese Russian Spanish Tamil Thai Urdu Wu Chinese
Availability:
From Owner
License:
LDC
Size:
118 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2007 NIST Language Recognition Evaluation Supplemental Training Set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic English Farsi French German Hindi Japanese Korean Mandarin Russian Spanish Tamil Vietnamese
Availability:
From Owner
License:
LDC
Size:
46 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2003 NIST Language Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bengali Dari English German Hindi Iranian Persian Japanese Korean Mandarin Chinese Persian Russian Spanish Standard Arabic Tamil Thai Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
66 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2007 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
English Hindi Punjabi Tamil
Availability:
From Data Center(s)
License:
TDIL, Government of India
Size:
600000 sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Issues in chunking parallel corpora: mapping Hindi-English verb group in ILCI
- Paper track: Short Paper
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Esha Banerjee | Jawaharlal Nehru University | IN |
| Author 2 | Akanksha Bansal | Jawaharlal Nehru University | None |
| Author 3 | Girish Jha | Jawaharlal Nehru University, New Delhi | IN |
| Main Contact | Esha Banerjee | Jawaharlal Nehru University | None |
Documentation:
<Not Specified>
Written
Corpus,
Language Type:
Multilingual
Languages:
Bengali Hindi Tamil Telugu
Availability:
Freely Available
License:
OpenSource
Size:
2 MByte
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: No more beating about the bush: A Step towards Idiom Handling for Indian Language NLP
- Paper track: Written
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Ruchit Agrawal | FBK, Trento | IT |
| Author 2 | Vighnesh Chenthil Kumar | IIIT Hyderabad | IN |
| Author 3 | Vigneshwaran Muralidaran | International Institute of Information Technology Hyderabad | IN |
| Author 4 | Dipti Sharma | IIIT, Hyderabad | IN |
| Main Contact | Ruchit Agrawal | FBK, Trento | None |
Documentation:
Available in English




